Method of Identifying Glycan structures using Mass Spectrometer data

Field of the Invention 

This invention relates to a method for characterising glycans and their
derivatives, also known as oligosaccharides. In particular, the invention is a method for
identifying glycan structures by correlating experimentally determined mass
spectrometer fragment data with data held in a database of glycan fragments.




Background of the Invention 


As used herein the term "glycan" will be used to describe both glycans and their
derivatives unless otherwise indicated. It is known that it is possible to characterize
glycan structures by correlating experimentally determined mass spectrometer fragment
data with data held in a database of glycan fragments using manual interpretive
methods. These methods involve researchers comparing the spectrometer fragment
data with known fragment mass data. The problem with such manual methods is that
they are slow and time consuming.

These problems essentially arise from the complexity of the glycan structures.
The following list describes glycans, with reference to Figs. 1 and 2, and defines
words that will be used in the remainder of the specification:

Structure - An oligosaccharide 1 consisting of monosaccharides 2 connected by
glycosidic bonds 3. A set of independent monosaccharides can be arranged as a linked
oligosaccharide through the glycosidic bonds between the monosaccharides. For the
purposes of scoring oligosaccharide matches, the oligosaccharide is defined as having a
direction - that is, a particular monosaccharide is defined as a reducing end
monosaccharide to which all other monosaccharides are either directly or indirectly
attached. Each different arrangement of monosaccharides is an isoform of the
oligosaccharide. The maximum number of oligosaccharide sequences for an m
monosaccharide oligosaccharide is given by the formula m^(m-1). This number is larger
than the actual number of sequences that may be found in nature. Also,
monosaccharides may share the same mass, and so not all isoforms would be unique.

Reducing end - the end of the structure r that is not involved in glycosidic 
linkage. 

Non-reducing end - any end of the structure 4 that is not the reducing end of the 
structure. 

Edges - Are located between the monosaccharides. E1, E2, E3 and E4 are the
edges for the structure shown in Fig. 1.

Depth - Is the distance of the edges from the reducing end, so in Fig. 1:
depth (E1) < depth (E2) < depth (E3) = depth (E4), and
E1 has the highest rank.

Cleavage - A carbon bond in the structure is broken. Cleavage may be
glycosidic, cross-ring or special. Fig. 2 shows cleavage at E3 and E4.

Glycosidic cleavage - A cleavage involving the breakage of the glycosidic
bond.

Cross-ring cleavage - A cleavage involving the breaking of two of the carbon-carbon
bonds in one of the carbon rings of a saccharide.

Special cleavage - A cleavage which is diagnostically significant, but does not
directly fall into the glycosidic or cross-ring categories.

Single cleavage - A cleavage event that involves only a single glycosidic,
cross-ring or special cleavage event, i.e. 1-cleavage.

Multiple cleavage - A cleavage event that involves more than one cleavage
event. Can be described as n-cleavage events, i.e. 2-cleavage, 3-cleavage etc. Fig. 2 is
an example of a 2-cleavage event.

Fragment - A result of a single or multiple cleavage event. In Fig. 2 the
fragments are 21, 22 and 23.

Disjoint fragments - Fragments which do not have any common
monosaccharides.

Reducing end fragment - A fragment which contains the reducing end of the
structure. Fragment 21 in Fig. 2.

Non-reducing end fragment - A fragment which does not contain the reducing
end of the structure. Both fragments 22 and 23 in Fig. 2.

Carbohydrate fragmentation patterns are discussed in the article "A Systematic
Nomenclature for Carbohydrate Fragmentations in FAB-MS/MS Spectra of
Glycoconjugates" by Bruno Domon and Catherine E Costello published in
Glycoconjugate J (1988) 5: 397-409, the entire contents of which are incorporated
herein by reference. "Domon and Costello" notation is the accepted norm for labelling
glycan fragment ions and is used herein.

Reducing end fragments may only be the result of particular types of cleavages.
For 1-cleavages, these are the Y, Z, X and certain special cleavage types. For
n-cleavages, reducing end fragments only occur where no B, C or A cleavages
occur. For example, reducing end fragments include
Y, Z and Y/Z (Y and Z simultaneously) fragments. A B/Y fragment cannot be a
reducing end fragment.

Non-reducing end fragments can result from combinations of cleavage types that
only include a single non-reducing cleavage type. It is not possible to create a fragment
from more than one non-reducing cleavage type.

Peak - A peak in an MS/MS spectrum. This peak has a mass to charge (m/z) and
relative intensity (relative to the largest peak in the spectrum).

Glycans may have numerous branch sites, indicated at 5 in Fig. 1, on each
monosaccharide, as well as isomers and anomers. This results in complex
fragmentation spectra in which the fragments observed may result from different
types of cleavage, cleavage in different positions and multiple cleavage [...].
1-cleavages [...] more information than 2-cleavages. For a 1-cleavage event,
the oligosaccharide is split into two parts, one
containing the reducing end and the other containing the non-reducing end. It is
possible to conclusively infer the composition of a complementary 1-cleavage fragment
from the composition of a 1-cleavage fragment, since the composition of the full
oligosaccharide is known. Reducing end 1-cleavage fragments are especially important
for sequencing as the composition of the fragment containing [...]. Also, [...]
known, the composition of the non-reducing end fragment can be inferred from the
[...].

2-cleavage events generally result in three possible fragment [...] being created. The
composition of only one of these fragments is ever fully characterised. [...] the
position of the reducing end monosaccharide is only disambiguated [...]
fragment [...] a reducing end [...] reducing end 2-cleavage
results, the composition of each of the two "lost" fragments [...] unambiguously
determined. Similarly, for a multiple reducing and non-reducing 2-cleavage, the
compositions of the two complementary fragments from the main fragment [...]
unambiguously determined. Since only the composition of [...]
oligosaccharide seen in a fragment can be accurately determined from 2-cleavage
events, there is a greater degree of uncertainty about the arrangements of the
monosaccharides in the complementary fragments.
Any discussion of documents, acts, materials, devices, articles or the like which
has been included in the present specification is solely for the purpose of providing a
context for the present invention. It is not to be taken as an admission that any or all of
these matters form part of the prior art base or were common general knowledge in the
field relevant to the present invention as it existed in Australia before the priority date
of each claim of this application.

Summary of the Invention 

In a first broad aspect of the present invention, there is provided a method for
characterising the structure or sub-structure of a glycan or glycan derivatives,
comprising the steps of:

Experimentally deriving the mass of an unidentified glycan molecule.

Comparing the mass of the unidentified glycan molecule with defined glycan
structures to select candidate structures for the glycan molecule.

Experimentally deriving the mass of fragments of the glycan molecule.

Theoretical fragmentation of the selected candidates.

Matching the mass of the fragments of the unidentified glycan molecule with the
mass of fragments theoretically derived from the candidate structures.

Scoring to produce ranked confidence scores for each of the candidate structures
by comparing the masses of the experimentally derived fragments with the masses of
the theoretically derived fragments.

Such a system is able to provide high throughput characterisation of glycans by
mass spectrometry, and automatic, comprehensive and rapid characterisation of glycan
[...] spectra, based on the interpreter's knowledge.

If insufficient confidence is obtained in the highest ranked score, the process can
be repeated by taking into account more complex cleavage patterns in the theoretical
fragmentation step, or by obtaining further spectra.

The initial data set used for comparison with the experimentally determined
mass may consist of only fragments that are the result of 1-cleavage fragmentation. It
may also include 2-cleavage events which are formed exclusively from glycosidic
[...]. The glycosidic cleavage pattern is [...]
oligosaccharide. This set of fragments provides
enough data for the primary sequence scoring method to work. The increase in data set
size by adding more fragments is limited by refining the data set when required. This
way, by restricting the types of fragments generated based upon the results of the
scoring, it is possible to keep the data set size to a manageable size.
In order to characterise oligosaccharides there are two criteria that need to be
fulfilled: the sequence has to be identified, and the linkage configuration and position
have to be determined. Information about either of the two will provide valuable data. In
a broad sense, mass spectrometry will be able to predict the sequence information, while
linkage information will be more difficult to obtain with that technique. Cross-ring
cleavages or specific cleavages have the potential to enable linkage position to be
determined, while the linkage anomery is the parameter that is most difficult to obtain.
On a computational basis, sequencing will be able to generate in silico glycosidic and
cross-ring fragments solely on a mathematical basis, but information about linkage
anomery cannot be included. The characteristic of a fragmentation spectrum that has the
potential to include some information about anomery is peak intensity, since
it could be envisaged that different anomeric configurations may undergo
fragmentation rearrangements with different kinetics. Fragment intensities will of course
also depend on other parameters.



Scoring methods will be designed to do the following: 

1. Provide a quality scoring based on the sequence allowing judgement of 
whether the sequence at least is correct. 

2. Provide a ranking between oligosaccharides based on sequence (glycosidic
cleavages)

3. Provide a ranking between oligosaccharides based on linkage position 
(cross ring cleavages) 

4. Provide a ranking between oligosaccharides based on other cleavage types
including generic n-cleavages and other special cleavage types, where a special
cleavage is a cleavage that produces a fragment that is specific to that structure, which
may include the loss of water, for example.


In a further aspect a scoring method is provided involving segmentation scoring
that counts the number of possible conformations for an oligosaccharide identified by a
set of matching oligosaccharide fragments. By determining how well a particular
conformation is supported by the evidenced fragments, it is possible to gauge the
quality of match for the particular structure. The score for ordered segments (that is,
where the segment it connects to is known) arising from 1-cleavage fragmentation is
calculated to be the number of arrangements for each segment multiplied by the
maximum number of points that each segment can attach to in its next segment.
[...] the number of [...] of monosaccharides
in the segment, and since the next segment that an ordered segment can attach to [...].

Additional information from 2-cleavages that span the boundary between the
[...] connected to by anchoring the 2-cleavage segment.

Further adjustment may take account of uneven sub-segment size and multiple
independent cleavage events.

The fragment generation process will preferably omit redundant fragments and
[...] and data to be processed to make the method more efficient.

The identification of glycan differences [...] often indicators for recognition of
glycosylation differences which for example can occur on proteins or
[...]
communications, immunological recognition and other significant characteristics.

Brief Description of the Drawings 

Specific embodiments of the present invention will now be described by way of
example only and with reference to the accompanying drawings, in which:

Figure 1 [...] of the [...] structure [...] depth (E1)
< depth (E2) < depth (E3) = depth (E4);

Figure 3 illustrates a non-disjoint double non-reducing end fragment;

Figure 4 [...] of fragment mass [...];

Figure 5a is a graph showing a spectrum of peak masses of an experimental
oligosaccharide illustrating the fragment assigned to [...];

Figure 6 shows an oligosaccharide structure of Example 1;

Figure 7 [...] showing [...] of the oligosaccharide [...] of
Figure 6;

Figures 8a to 8c are parts of a table giving the score, missed intensities and
grouping score for a number of oligosaccharide structures which potentially [...] the
oligosaccharide structure of Figure 6;

Figure 9 shows an oligosaccharide structure of Example 2;

Figure 10 [...] showing [...] of [...] structure of [...];

Figure 11 shows a table giving the score, missed intensities and grouping score
[...];

[...] into [...] fragmentation;

[...] into [...] 2-cleavage fragmentation;

Figures 14 a, b and c illustrate the number of possible arrangements of
[...].

Detailed Description of Preferred Embodiments 

The characterising process is called Glycofragment Mass Fingerprinting, or
GMF, and is outlined in Fig. 4:

Experiments derive the mass of an unidentified glycan molecule 40.

Theoretical work begins with a database of defined glycan structures 41 which
[...].

Preliminary matching involves comparing the mass of the glycan
molecule [...] to select candidate structures for the glycan molecule.

Further experiments derive the mass of fragments of the molecule 42.

Theoretical fragmentation is then performed on the candidates 43.

Matching 44 involves comparing the mass of the fragments of the unidentified
glycan molecule with the mass of fragments theoretically derived from the candidate
structures.

Scoring 45 produces ranked scores for each of the candidate
structures [...] number of different scoring regimes.
If insufficient confidence is obtained in the highest ranked score, the process can
be repeated 46 by taking into account more complex cleavage patterns in the theoretical 
fragmentation step 43, or by obtaining further spectra. 

Although mass spectrometry is the preferred method for measuring the mass of
the glycan and fragmenting the glycan, other methods, including chemical methods,
could be used for fragmenting the glycan, although mass spectrometry will still be used
for measuring the mass of the fragments. For example, glycan fragments may be
[...] exoglycosidases, periodate [...] acidic hydrolysis, and
[...].

Derive the mass of a glycan molecule 40.

A user will supply the mass of an unidentified glycan molecule from the results
of mass spectroscopy.

Database of identified and characterised glycan structures 41

A number of suitable databases are available. For example, "GlycoSuiteDB",
available at www.glycosuite.com, provides a database of identified and characterised
glycan structures, as does the database "Glycominds". The database can be in simple
table form, or can be in a relational form to exploit other information that may be
associated with glycan structures, such as biological source information.

Preliminary matching involves comparing the mass of the unidentified glycan
molecule with the identified and characterised glycan structures to select candidate
structures for the glycan molecule.
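The preliminary matching step can be sketched as a simple mass-window filter over the database. This is a minimal illustration only: the function name `select_candidates`, the dictionary layout, the tolerance value and the example masses are all assumptions made for the sketch and are not taken from the specification.

```python
from typing import Dict, List


def select_candidates(observed_mass: float,
                      database: Dict[str, float],
                      tolerance: float = 0.5) -> List[str]:
    """Return ids of database structures whose mass lies within
    +/- tolerance (Da) of the observed parent mass."""
    return sorted(
        structure_id
        for structure_id, mass in database.items()
        if abs(mass - observed_mass) <= tolerance
    )


# Hypothetical entries: identifiers and masses are illustrative only.
example_db = {"glycan_A": 1000.0, "glycan_B": 1000.3, "glycan_C": 1162.4}
candidates = select_candidates(1000.1, example_db, tolerance=0.5)
```

Only the structures returned here would be passed on to theoretical fragmentation, which keeps the number of fragments to be generated small.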


Experiments derive the mass of the fragments of the molecule 42

Individual oligosaccharides could be submitted to GMF after mass spectrometry
under conditions producing fragment ions, for example by tandem mass spectrometry
or in-source fragmentation, or alternatively oligosaccharide mixtures could be separated
into individual components with separating methods hyphenated with mass
spectrometry. This includes techniques such as HPLC and capillary electrophoresis.
Various ionisation methods and conditions could be used. Multiple stages of mass
spectrometry could also be used, where further fragmentation of fragment ions is
performed.

Theoretical fragmentation using the selected candidates 43
[...] present invention, a database of the theoretical peak masses for all
possible glycan fragments, along with their unfragmented molecular parent mass, is
[...] of fragments for [...] database of
identified and characterised glycan structures.

In a refinement of the invention, in order to match against and identify novel
glycan structures, which are not already disclosed in existing databases, it is [...]
[...] fragmentations of [...]
[...] possible [...] is [...] much larger
[...] second search to which [...] masses do
not satisfactorily match to any known glycan fragment fingerprint.

In order to obtain theoretical peak masses for a glycan structure, an algorithm is
[...] of [...] n-cleavages for a structure. The
[...] fragments [...] a combinatorial/permutation method.
Edge Selection

A glycan structure S is composed of m monosaccharides with m-1 glycosidic bonds
existing between monosaccharides. In order to generate a full set of fragments for
n-cleavage [...].
There exist C(m-1, n) combinations of glycosidic cleavage points (edges) for an n-cleavage
fragmentation. In order to minimise size complexity an iterative method is used to
generate all combinations of edges. E is the k-subset of the edges found in S, where k can be
any number up to (m-1).

[...] edges
are considered. For the example shown in Fig. 1 there are four edges E1, E2, E3 and
E4. For a double cleavage, k=2 and the k-subset comprises all possible combinations
[...], namely (E1, E2), (E1, E3), (E1, E4), (E2, E3), (E2, E4) and (E3, E4).
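The enumeration of k-subsets of edges can be sketched with the standard library. This is an illustrative sketch rather than the method of the specification, which describes an iterative generator; the edge labels follow Fig. 1 and the helper name `k_subsets` is an assumption.

```python
from itertools import combinations

# The four edges of the structure in Fig. 1.
edges = ["E1", "E2", "E3", "E4"]


def k_subsets(edge_list, k):
    """All k-element combinations of cleavage points (edges)."""
    return list(combinations(edge_list, k))


# k = 2 corresponds to a double cleavage.
double_cleavages = k_subsets(edges, 2)
```

For the four edges of Fig. 1 this yields the six pairs listed in the text.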

The edges within each k-subset are then sorted according to depth, which
produces an edge vector. Edges that involve monosaccharides closer to the reducing
end are sorted with a higher rank than edges occurring at a greater depth. For example,
Fig. 1 illustrates the depth of edges in a glycan structure S, where
depth (E1) < depth (E2) < depth (E3) = depth (E4). Edges E1 and E2 are selected [...]
the k-subset of edges is [...]. Once sorted,
the edge vector will be (1,2) since E1 is closer to the reducing end of the structure,
which is conventionally drawn on the right of the structure and is the end in which the
hydroxide on C-1 is not extended with additional monosaccharide units. The ordering
of edges is crucial to ensuring the accurate generation of fragments, as it is possible to
choose particular cleavages to assign to the edges so that a disjoint fragment is
generated. Thus, with reference to Figure 2, if edges E3 and E4 are cut, two separate
fragments are created.
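Sorting a k-subset into an edge vector can be sketched as below. The numeric depth values are assumptions chosen only to satisfy depth(E1) < depth(E2) < depth(E3) = depth(E4) from Fig. 1, and ties are broken by edge label purely to make the output deterministic.

```python
# Assumed depths for the edges of Fig. 1 (only their ordering matters).
depth = {"E1": 1, "E2": 2, "E3": 3, "E4": 3}


def edge_vector(k_subset):
    """Sort edges so that those nearest the reducing end come first."""
    return tuple(sorted(k_subset, key=lambda edge: (depth[edge], edge)))


vector = edge_vector(("E2", "E1"))
```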

Carbohydrate fragmentation patterns are discussed in the article "A Systematic
Nomenclature for Carbohydrate Fragmentations in FAB-MS/MS Spectra of
Glycoconjugates" by Bruno Domon and Catherine E Costello published in
Glycoconjugate J (1988) 5: 397-409, the entire contents of which are incorporated
herein by reference. "Domon and Costello" notation is the accepted norm for labelling
glycan fragment ions and is used herein.

Reducing end fragments may only be the result of particular types of cleavages.
For 1-cleavages, these are the Y, Z, X and certain special cleavage types. For
n-cleavages, reducing end fragments only occur where no B, C or A cleavages
occur. For example, reducing end fragments include
Y, Z and Y/Z (Y and Z simultaneously) fragments. A B/Y fragment cannot be a
reducing end fragment.

Non-reducing end fragments can result from combinations of cleavage types that
only include a single non-reducing cleavage type. It is not possible to create a fragment
from more than one non-reducing cleavage type.

[...] possible [...] computationally intensive. Where two
or more cleavages occur, some of those 2-cleavages will produce fragments that will
already have been accounted for in the 1-cleavages. For example, a reducing end
fragment produced when the two edges E1 and E4 are cut is also produced as [...]
the reducing end fragment that was a result of the E1 cleavage. Any fragments produced
by a 2-cleavage that are also produced by a 1-cleavage do not need to be calculated.
Generally, when generating all n-cleavages, any fragments that could be produced by an
m-cleavage (where m < n) are discarded.
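The discarding of redundant fragments can be sketched on a small tree. Everything in this sketch is an assumption made for illustration: the five-residue structure shaped like Fig. 1, the naming of each edge after its child node, and the simplification that a fragment is identified only by the set of monosaccharides it contains (cleavage types and masses are ignored).

```python
from itertools import combinations

# Hypothetical five-residue structure shaped like Fig. 1.
# Each edge is named after its child node: child -> parent.
# "r" is the reducing end monosaccharide.
parent = {"a": "r", "b": "a", "c": "b", "d": "b"}
nodes = ["r", "a", "b", "c", "d"]


def fragments(cut_edges):
    """Connected components left after cutting the given edges."""
    groups = {}
    for node in nodes:
        top = node
        # Walk towards the reducing end until a cut edge or the root.
        while top in parent and top not in cut_edges:
            top = parent[top]
        groups.setdefault(top, set()).add(node)
    return {frozenset(group) for group in groups.values()}


def new_fragments(n):
    """Fragments of n-cleavages not already produced by fewer cleavages."""
    seen = set()
    for m in range(1, n):
        for cut in combinations(parent, m):
            seen |= fragments(set(cut))
    fresh = set()
    for cut in combinations(parent, n):
        fresh |= fragments(set(cut)) - seen
    return fresh


one_cleavage = new_fragments(1)
two_cleavage_only = new_fragments(2)
```

In this toy structure the reducing end fragment containing only "r" arises from cutting edge "a" alone, so it is not reported again when edge "a" is cut together with another edge.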

[...] of edges obtained in the edge selection step, a fragment
can be generated by applying a set of fragment types to it. Referring now to Figure 3,
[...]
of edges formed from a 2-cleavage event consisting of Edge A and Edge B. At Edge A
the possible cleavage types that could have occurred are all reducing and non-reducing
end cleavages. At Edge B, only reducing end fragments could have occurred. Only
reducing end cleavages occur at Edge B as it is not possible to have two non-reducing
[...] resulting in a [...] fragment [...] would
in fact be identical to a single cleavage occurring at the edge B with the greatest depth.
Cleavage Assignment

[...] cleavages to [...] each [...]

T = the set of n-element cleavage type permutations.
for t ∈ T (where the size of t is n):
∀ e ∈ E, ∀ t ∈ T: Fragment = (t, e), i.e. (fragment type, position)

[...] the set of cleavage types [...]
contain more than one non-reducing end fragment. Also, to avoid disjoint fragments
occurring, the structure is checked to ensure that the structure can support the fragment.
[...] checking occurs to invalidate any reducing end fragments where, for a reducing
end cleavage type assigned to a cleavage point, a traversal to the reducing end of the
structure does not traverse any other cleavage points. Non-reducing end fragments are
marked as invalid if for any of the reducing-end cleavage points a traversal [...].
[...] cleavage [...] occurs by starting at the
[...] structure towards the reducing end, and marking any monosaccharide that is traversed
[...]. Any fragment
which causes the loss of branches containing marked monosaccharides due to [...] A
cleavage type is discarded.
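Pairing cleavage types with the edges of an edge vector can be sketched as a filtered Cartesian product. This sketch covers only the rule that an assignment may include at most one non-reducing cleavage type; the structural traversal checks described above are not implemented, special cleavage types are omitted, and the function name is an assumption.

```python
from itertools import product

# Domon and Costello cleavage types.
REDUCING = ("Y", "Z", "X")       # fragment retains the reducing end
NON_REDUCING = ("B", "C", "A")   # fragment retains a non-reducing end


def cleavage_assignments(edge_vector):
    """Yield (type, edge) pairings with at most one non-reducing type."""
    all_types = REDUCING + NON_REDUCING
    for types in product(all_types, repeat=len(edge_vector)):
        if sum(t in NON_REDUCING for t in types) <= 1:
            yield tuple(zip(types, edge_vector))


assignments = list(cleavage_assignments(("E1", "E2")))
```

For a 2-cleavage this keeps 27 of the 36 raw type pairings; for example B at E1 with Y at E2 is kept, while B at E1 with C at E2 is rejected.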

Once the assignment of cleavage types to cleavage points has been verified, a
virtual fragmentation occurs of the structure. This process involves removing branches
from the virtual representation of the structure so that it will represent the structure of
[...] fragment has been [...] be obtained
[...] of [...] monosaccharides, as well as any mass losses
[...]
Domon and Costello notation and assigned to the fragment.

[...] a combinatorial problem. As the
number of fragments dramatically increases as the number of allowed cleavages
increases, it is not feasible to generate all fragments a priori. The method of the [...]
performed [...] of fragment [...]
are stored in a database. Typically the fragments for 1-cleavages, and 2-cleavages from
exclusively glycosidic cleavages, will initially be used.
Matching 44

Matching involves comparing the mass of the fragments of the unidentified
glycan molecule with the mass of fragments theoretically derived from the candidate
structures.



Scoring 45. 

A user will supply a spectrum, which consists of pairs of m/z and intensity
values. Each pair is called a peak. The peak mass is converted into a true mass by
adjusting for charge state and adduct, and then compared against the set of theoretical
fragments to find any fragments which have a mass within the tolerance range of the
peak's true mass. The fragments are then collated according to the parent structure and
scored.
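The conversion of a peak to a true mass and the tolerance comparison can be sketched as follows. This is a minimal sketch under stated assumptions: ions are taken to be positive-mode proton adducts, a single fixed charge is applied to every peak, and the tolerance, peak list and fragment masses are hypothetical values chosen for illustration.

```python
PROTON_MASS = 1.007276  # Da; assumes protonated ions [M + zH]z+


def neutral_mass(mz, charge, adduct_mass=PROTON_MASS):
    """Convert an observed m/z into the mass of the neutral species."""
    return mz * charge - charge * adduct_mass


def match_peaks(peaks, theoretical, charge=1, tolerance=0.3):
    """Return (peak m/z, fragment label) pairs within the mass tolerance."""
    hits = []
    for mz, _intensity in peaks:
        mass = neutral_mass(mz, charge)
        for label, fragment_mass in theoretical.items():
            if abs(fragment_mass - mass) <= tolerance:
                hits.append((mz, label))
    return hits


# Hypothetical spectrum and fragment masses, for illustration only.
peaks = [(204.09, 100.0), (366.14, 45.0), (500.00, 5.0)]
theoretical = {"frag_1": 203.08, "frag_2": 365.13}
hits = match_peaks(peaks, theoretical)
```

Other adducts (for example sodium) or negative-mode ions would need a different `adduct_mass` and sign convention.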




There are two strands to the scoring process: one to determine the sequence
quality of match of a candidate structure, and another to rank the candidate structures
relative to each other. The families of algorithms for the two scoring types are defined as
quality and relative scoring methods respectively. Based on the combination of these
two scoring methods, it is possible to determine the likelihood of a result structure
being the one defined by the input spectrum, with regard to sequence or linkage
information or both.

Quality Scoring 

The quality score for a result encapsulates how well the fragments matched for a 
sequence define that sequence. For example, a result structure that matches only a 
single small fragment will be a low quality result, whilst a structure which has many 
fragments matched which are distributed over the entire structure will have a high 
quality score. One such quality scoring algorithm is a grouping algorithm. 

Group scoring derives the cleavage points from the fragment types, and obtains 
a number which represents how well the structure is characterised by the set of 
fragments associated with it. The best fragments used to characterise a structure are 
those resulting from 1-cleavages. If there are m - 1 unique cleavage points found in a 
glycan structure's associated 1-cleavage fragments for a glycan having m 
monosaccharides, then there is enough evidence in the fragments that the sequence of 
the structure is valid. 
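The sufficiency test described above, m - 1 unique cleavage points among the matched 1-cleavage fragments, can be sketched in a few lines. The function name and the representation of cleavage points as edge labels are assumptions made for the sketch.

```python
def sequence_fully_supported(one_cleavage_points, m):
    """True if 1-cleavage fragments cover all m - 1 glycosidic bonds."""
    return len(set(one_cleavage_points)) == m - 1


# A five-residue glycan has four glycosidic bonds; duplicates count once.
complete = sequence_fully_supported(["E1", "E2", "E3", "E4", "E1"], m=5)
partial = sequence_fully_supported(["E1", "E2", "E2"], m=5)
```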

Fragments resulting from 2-cleavages do not necessarily indicate the presence of
a specific cleavage point in a structure. 1-cleavages are special as the presence of a
[...] a fragmentation occurred at the [...] point
[...]. One of the
[...] if [...] point has
[...] overlap
[...] where [...] of [...] assigned
for it to contain an equal amount of information. For this reason, [...]
cleavage fragment to fulfil the cleavage point. If [...]
by a 1-cleavage fragment, it will [...] fragment. The actual score assigned
is derived using:

Equation 1

Score = (a - 0.25b) / (m - 1)

where a is the number of cleavage points assigned to 1-cleavage [...] and b is the
number of cleavage points assigned [...]
cleavage points are strongly supported by [...]
information from both single and multiple cleavages.

Segmentation Scoring

Segmentation scoring is a qualitative scoring method that counts the number of
possible conformations for an oligosaccharide identified by a set of matching
oligosaccharide fragments. By determining how well a particular conformation is
supported by the evidenced fragments, it is possible to gauge the quality of match for
the particular structure.



Simple segmentation

A 1-cleavage fragmentation that [...] on an oligosaccharide can be considered
[...]. Referring to Figure 12, consider an oligosaccharide where a 1-cleavage
fragmentation occurs at a glycosidic bond. This fragmentation provides two pieces of
evidence about the sequence. We can consider the fragmentation to have split the
oligosaccharide into two parts - S' and S". Both S' and S" are segments of the
oligosaccharide. A segment of an oligosaccharide is itself an oligosaccharide and is
used to help measure the worth of evidence of a particular experimentally observed
fragmentation.

S" contains the reducing end of the oligosaccharide somewhere within its set of
monosaccharides. All monosaccharides contained within S" can attach to the reducing
end, or form chains of monosaccharides terminating at the reducing end
monosaccharide. For the monosaccharides contained in S' to be attached to the reducing
end, there must be a single child monosaccharide connected from S' to S". A
monosaccharide in S" cannot be a child of a monosaccharide in S'. That is, any
monosaccharide in S" is closer to the reducing end than any monosaccharide in S'.

This fragmentation provides three pieces of evidence about sequence for this
oligosaccharide:

all monosaccharides within S' must be connected to at least one monosaccharide
in S'

all monosaccharides within S" must be connected to at least one monosaccharide
in S"

a single monosaccharide in S' is connected to another monosaccharide in S"

These three pieces of evidence can be used to construct a set of oligosaccharides which
can support a fragment with an identical mass to the found fragment. Further evidence,
such as the composition of the full oligosaccharide, is also used to construct these
oligosaccharides.

Although it is algorithmically possible to create and sequence all possible
structures which may support this fragmentation, it is only necessary to count the
number of structures that will be generated. The total number of structures can be
calculated by enumerating both the possible arrangements of a segment, and the
number of ways that a particular segment may be attached to another segment. For S',
containing m monosaccharides, we can calculate the number of possible arrangements
of the monosaccharides using the formula m^(m-1). Similarly, S" can be arranged in
n^(n-1) ways. To calculate the number of ways that S' can be attached to S", we consider
the number of positions at which the reducing end monosaccharide from S' can attach to a
monosaccharide in S". Since S" comprises n monosaccharides, there are n possible
attachment positions. In total, there are n^(n-1) x m^(m-1) x n possible arrangements of
monosaccharides.
For example, with reference to Figures 14a to c there are 3^2 x 4^3 x 4 = 2304
possible arrangements, illustrated in Figure 14c.
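The count for simple segmentation can be checked with a short calculation. The sketch below assumes that a segment of k monosaccharides has k^(k-1) arrangements and that S' (m residues) can attach at any of the n residues of S"; the function names are illustrative only.

```python
def arrangements(size):
    """Maximum arrangements of a segment of `size` residues: size^(size-1)."""
    return size ** (size - 1)


def one_cleavage_structures(m, n):
    """Structures consistent with a 1-cleavage that separates S'
    (m residues) from S'' (n residues, containing the reducing end)."""
    return arrangements(m) * arrangements(n) * n


# Worked example from the text: m = 3 and n = 4.
total = one_cleavage_structures(3, 4)  # 9 * 64 * 4 = 2304
```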

Complex segmentation

Although segmentation is simple for a single fragment, multiple fragments
significantly complicate the process of segmentation.

2-cleavage fragmentation

This is illustrated with reference to Figure 13. There are two cases for the
segmentation resulting from a 2-cleavage event:

Reducing end 2-cleavage event - When a reducing end 2-cleavage event
fragment is used as evidence for segmentation, it segments the oligosaccharide into
three segments. S" and S''' can attach to any position in S', since the evidence is only for
two glycosidic cleavages to have occurred.

Non-reducing end 2-cleavage event - A non-reducing 2-cleavage event also
creates three segments. Since no directional information is stored in this fragment,
the reducing end may be contained in S" or S'''. S' may be attached to any of S" or S'''.
Similarly, S" may be attached to S' or S''' and S''' may be attached to S' or S". There are
9 possible ways that S', S" and S''' can be arranged together. Let S' contain x
monosaccharides, S" y monosaccharides, and S''' z monosaccharides. The full structure
contains m monosaccharides. There are

9 * x^(x-1) * y^(y-1) * z^(z-1) * (y*z) * (x*z) * (x*y) possible arrangements of
oligosaccharides to fulfil these conditions.
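The count for a non-reducing end 2-cleavage event can be sketched in the same way. The printed expression is partly illegible, so this sketch assumes it reads 9 * x^(x-1) * y^(y-1) * z^(z-1) * (y*z) * (x*z) * (x*y); both that reading and the function name are assumptions.

```python
def non_reducing_two_cleavage_structures(x, y, z):
    """Arrangement count for three segments of x, y and z residues,
    under the assumed reading of the expression in the text."""
    def arrangements(size):
        return size ** (size - 1)

    internal = arrangements(x) * arrangements(y) * arrangements(z)
    attachments = (y * z) * (x * z) * (x * y)
    return 9 * internal * attachments


smallest = non_reducing_two_cleavage_structures(1, 1, 1)
```

With one monosaccharide per segment only the factor of 9 remains, which corresponds to the nine ways the three segments can be arranged together.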

Multiple independent 1-cleavage events

Multiple 1-cleavage events complicate the segmentation process by introducing
nested segments. A nested segment is a segment created from a segmentation of an
existing segment. The original segment will change from containing a set of
monosaccharides, to a set of segments. Segments are created by considering
fragmentation evidence from the non-reducing terminal monosaccharides and working
towards the reducing end. As each piece of fragmentation evidence is applied to the
segmentation, the segment containing the reducing end is further segmented. A
complex set of rules for creation of structures is created by this refinement of
segmentation, resulting in a reduction in the number of possible structures that can be
created with each successive fragment accounted for as evidence. Once all 1-cleavage
events have been accounted for, a set of segments with defined relationships between
each other is found.

Multiple 2-cleavage events

A set of segments, along with information regarding which segments are
definitely attached to other segments, is created at the end of the previous stage. For any
segments which contain more than one monosaccharide, further fragments are
interrogated to find any further evidence of sequence. 2-cleavage reducing end
cleavages are treated by intersecting the segments created by the 2-cleavage event with
the existing segments. Other 2-cleavage events can only be relied on for the grouping
of monosaccharides in the fragment. This grouping of monosaccharides is also treated
as a segment, and intersected with the existing segments. Once all fragments have been
used to create segments, the oligosaccharide is maximally segmented, i.e. all groupings
of monosaccharides have been merged to produce the smallest groups of
monosaccharides possible.
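
The intersection step can be pictured as repeated refinement of a partition of the monosaccharides. The sketch below is one possible illustration, not the specification's implementation: it tracks only the groupings and ignores the attachment rules between segments, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def refine(segments: List[FrozenSet[str]], grouping: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Split every existing segment into the part inside and the part outside a grouping."""
    refined: List[FrozenSet[str]] = []
    for segment in segments:
        for part in (segment & grouping, segment - grouping):
            if part:  # drop empty intersections
                refined.append(part)
    return refined


def maximal_segments(residues: Iterable[str], groupings: Iterable[Iterable[str]]) -> List[FrozenSet[str]]:
    """Start from one segment holding all residues and intersect with each grouping in turn."""
    segments = [frozenset(residues)]
    for grouping in groupings:
        segments = refine(segments, frozenset(grouping))
    return segments


if __name__ == "__main__":
    # Groupings taken from the fragments of a six-residue structure A..F.
    result = maximal_segments("ABCDEF", [{"A"}, {"C", "D", "E"}, {"F"}, {"C", "D"}])
    print(sorted(sorted(s) for s in result))
    # [['A'], ['B'], ['C', 'D'], ['E'], ['F']]
```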


Calculating the score 

The score is obtained by calculating the score for the ordered segments, which
in turn requires the score for unordered segments. An ordered segment is a segment
for which the segment that it connects to is known. Ordered segments arise from
1-cleavage fragmentation. To calculate the score for ordered segments, the number of
arrangements for each segment is calculated and then multiplied by the minimum
number of points at which each segment can attach to its next segment.

$$\mathrm{Score} = \prod_{\text{segment} = \text{non-reducing end segment}}^{\text{reducing end segment}} \mathrm{Score}(\text{segment}) \times (\text{minimum number of positions that segment can attach at})$$
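
A minimal sketch of this product follows, assuming each ordered segment is described by its own score and by the minimum number of positions at which it can attach to the next segment (taken as 1 for the reducing end segment, which has no next segment). The function name and the tuple layout are illustrative assumptions.

```python
from math import prod
from typing import Sequence, Tuple


def ordered_score(chain: Sequence[Tuple[int, int]]) -> int:
    """Multiply score(segment) x attachment positions along the chain.

    `chain` lists (segment score, minimum attachment positions) from the
    non-reducing end segment to the reducing end segment.
    """
    return prod(score * attach for score, attach in chain)


if __name__ == "__main__":
    # A five-residue segment (5**4 arrangements) attached at one position
    # to a single-residue reducing end segment.
    print(ordered_score([(5 ** 4, 1), (1, 1)]))  # 625
```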

The score of each segment is calculated in one of two ways. If the segment has
been sub-segmented by 2-cleavage fragmentation, a method detailed later is used. If no
further sub-segmentation has occurred, the score is calculated as the number of
arrangements of monosaccharides in the segment. Since the next segment that an
ordered segment can attach to is known, it is possible to know how many points an
ordered segment can attach to. With no additional information, the segment can attach
to as many monosaccharides as there are in the next segment. Additional information
from 2-cleavages that span the boundary between the two segments can reduce the
possible number of positions that the segment can be connected to, by essentially
anchoring the 2-cleavage segment. When there are no ordered segments identified, the
entire oligosaccharide is treated as one large segment, and the score is calculated using
2-cleavage fragments if possible.

$$\mathrm{score}(\text{segment}) = \frac{\mathrm{arrangements}(\text{subsegments}) \times \prod_{s \in \text{subsegments}} \mathrm{score}(s) \times \text{number of attachment points}}{\text{number of anchoring segments}}$$

The number of arrangements of sub-segments is given by the above formula. A
further adjustment of the number of structures created has to be performed due to
uneven sub-segment size. For sub-segments that are larger than a single
monosaccharide, the number of arrangements is increased based upon the number
of sibling sub-segments. The number of attachment points is given by finding the
smallest sub-segments of a sub-segment that the sub-segment has in common with its
sibling segments. The number of anchoring segments is the number of sub-segments
that the segment has grouped together with the segments from a sibling segment.


Contrasted Intensities 

In order to further discriminate between matches using the segmentation scoring 
method, the contrasted intensity score is used. The contrasted intensity score is applied 
in two stages. The first stage looks at the total intensity matched to glycosidic cleavage 
fragments for a match in comparison to the total intensity matched to glycosidic 
cleavages for the other candidate structures. The second stage compares total intensity 
matched to cross-ring cleavages, and other fragment types. 
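
One way to picture the two stages is as a lexicographic ranking: candidates are compared first on intensity matched to glycosidic cleavage fragments, and only then on intensity matched to cross-ring and other fragment types. The sketch below assumes that interpretation; the `Candidate` type and its field names are hypothetical and the intensity values are made up for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    glycosidic_intensity: float  # total intensity matched to glycosidic cleavage fragments
    other_intensity: float       # total intensity matched to cross-ring and other fragments


def rank_by_contrasted_intensity(candidates: List[Candidate]) -> List[Candidate]:
    """Stage 1: higher glycosidic intensity first. Stage 2: higher other intensity first."""
    return sorted(
        candidates,
        key=lambda c: (c.glycosidic_intensity, c.other_intensity),
        reverse=True,
    )


if __name__ == "__main__":
    ranked = rank_by_contrasted_intensity([
        Candidate("X", 80.0, 5.0),
        Candidate("Y", 80.0, 12.0),
        Candidate("Z", 60.0, 30.0),
    ])
    print([c.name for c in ranked])  # ['Y', 'X', 'Z']
```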

Example 1

We define the segments marked by $S_x$, where x is any number, to be the
segments directly resulting from the mapping of a fragment to the structure.

Segments marked with $M_x$, where x is any number, are segments that are
derived from the mapping of fragments to the structure as well as the intersection of
different $S_x$ segments.

Consider the hexasaccharide shown in Fig. 15, where the structure mass, shown
in a), and four fragment masses (b - e) have been found. Initially, no rules are known
about the arrangement of monosaccharides in the structure.

a)

With no fragments, the structure is segmented into a single segment containing
all monosaccharides.

Rules

(1) $S = \{A, B, C, D, E, F\}$

(2) $M = S$

Arrangements

$= \mathrm{arrangements}(M)$
$= 6^5 = 7776$

where arrangements is a function calculating the number of arrangements of a segment.
The attach function (seen later) is the number of points at which a segment can attach
to another segment.

This is a purely mathematical number of arrangements of the monosaccharides
in the structure, and does not accurately reflect reality, where the number of
arrangements is limited by the number of possible attachment points for each
monosaccharide.

b)

A fragment resulting in the loss of monosaccharides B - F, or alternatively the
loss of A only, is found. Fragments resulting in the loss of B - F and fragments resulting
in the loss of A are complementary. The segments M1 and M2 are created from the
intersection of the new segments (S1 and S2) and the existing segment (S).

Rules

(3) $S_2 = \{B, C, D, E, F\}$

(4) $S_1 = \{A\}$

(5) $S_2 \to S_1$

(6) $M_1 = S \cap S_1$

(7) $M_2 = S \cap S_2$

Arrangements

$= \mathrm{arrangements}(M_2) \times \mathrm{attach}(M_2 \to M_1) \times \mathrm{arrangements}(M_1)$
$= 5^4 \times 1 \times 1 = 625$

c) 

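Rules

(8) $S_3 = \{A, B, F\}$

(9) $S_4 = \{C, D, E\}$

(10) $S_4 \to S_3$

(11) $M_3 = S_2 \cap S_3$

(12) $M_4 = S_2 \cap S_4$

Arrangements

$= \mathrm{arrangements}(M_4) \times \mathrm{attach}(M_4 \to M_3) \times \mathrm{arrangements}(M_3) \times \mathrm{attach}(M_3 \to M_1) \times \mathrm{arrangements}(M_1)$
$= 3^2 \times 2 \times 2 \times 1 \times 1 = 36$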

d)

This is the final 1-cleavage event to be turned into a segment. There is
[...] but the rest of [...] has all of its [...]

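Rules

(13) $S_5 = \{A, B, C, D, E\}$

(14) $S_6 = \{F\}$

(15) $S_6 \to S_5$

Arrangements

$= \mathrm{arrangements}(M_4) \times \mathrm{attach}(M_4 \to B) \times \mathrm{arrangements}(F) \times \mathrm{attach}(F \to B) \times \mathrm{arrangements}(B) \times \mathrm{attach}(B \to M_1) \times \mathrm{arrangements}(M_1)$
$= 3^2 \times 1 \times 1 \times 1 \times 1 \times 1 \times 1 = 9$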
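
The totals quoted for steps a) to d) can be reproduced with a few lines of arithmetic. The sketch below assumes that a segment of k monosaccharides has k^(k-1) arrangements and that the attach factors are those listed in the comments. The factor list for step c) is inferred from its stated total of 36 rather than read directly from the text.

```python
from math import prod


def arrangements(k: int) -> int:
    """Arrangements of a k-residue segment (assumed k**(k-1))."""
    return k ** (k - 1)


# Each list alternates arrangements(...) and attach(...) factors as in the text.
steps = {
    "a": [arrangements(6)],                                  # M = {A..F}
    "b": [arrangements(5), 1, arrangements(1)],              # M2 -> M1
    "c": [arrangements(3), 2, arrangements(2), 1,            # M4 -> M3 -> M1 (inferred factors)
          arrangements(1)],
    "d": [arrangements(3), 1, arrangements(1), 1,            # M4 -> B, F -> B, B -> M1
          arrangements(1), 1, arrangements(1)],
}

totals = {step: prod(factors) for step, factors in steps.items()}

if __name__ == "__main__":
    print(totals)  # {'a': 7776, 'b': 625, 'c': 36, 'd': 9}
```

Each additional 1-cleavage fragment shrinks the count, which is the reduction in candidate structures that the segmentation is intended to capture.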



e) 



The 2-cleavage event only contains information about the grouping of
monosaccharides, and does not contain information about complementary segments.
As such, we can only add a single rule to the set of rules. The number of arrangements
for M4 is calculated in (f).

Rules

(16) $S_7 = \{C, D\}$

Arrangements

$= \mathrm{arrangements}(M_4) \times \mathrm{attach}(M_4 \to B) \times \mathrm{arrangements}(F) \times \mathrm{attach}(F \to B)$

f) 

M4 is split into two segments: M5 and a segment containing only E. There are
two arrangements of this segment, $M_5 \to E$ and $E \to M_5$. M5 can only attach to E in a
single position, whereas E can attach to M5 in more positions. To account for this, an
adjustment for the number of attachment positions is used to modify the number of
arrangements of M4. In this example, the number of arrangements of M4 is increased
from 4 to 6.

Example 2 
a) 

Referring to Fig. 13, two single fragments have been found for this structure. It is
split into three segments: S1, S2 and a segment containing only F. F can attach to S1 at
two positions (E, D), and S2 can attach to S1 at three positions (A, B, C). The total
number of arrangements possible is 108.

A 2-cleavage event is used to segment the structure from a). A resulting segment
from this cleavage spans two 1-cleavage segments. In this case, the 1-cleavage
segments are sub-segmented. Let the segment that this 2-cleavage would have created
be S'.

$S' = \{B, C, D\}$

$S_3 = S_1 \cap S' = \{E, D\} \cap \{B, C, D\} = \{D\}$

$S_4 = S_2 \cap S' = \{A, B, C\} \cap \{B, C, D\} = \{B, C\}$

Also, we know that S3 must attach to S4. Because of this, S1 can only attach to
S2 via B and C. If S1 attached to S2 via A, the rule governing S' would be violated.
Since S4 must attach to S3, the number of arrangements of S3 is reduced so that this rule
can be accommodated. There are now only 24 arrangements of monosaccharides
supported by this fragmentation. The number of positions at which S1 attaches to S2 was
reduced to 2, and the number of arrangements of S1 was reduced to 1 (D must be used to
attach to S2, and cannot attach to both E and S2). Also, the number of arrangements of
S2 was reduced to 6.
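
The two totals in this example can be reproduced as simple products. The factorisation below is one reading that is consistent with the stated figures of 108 and 24, assuming k^(k-1) arrangements for an unconstrained k-residue segment. The variable names are illustrative.

```python
def arrangements(k: int) -> int:
    """Arrangements of an unconstrained k-residue segment (assumed k**(k-1))."""
    return k ** (k - 1)


# a) Before the 2-cleavage evidence: S1 = {E, D}, S2 = {A, B, C}, plus F.
f_positions_on_s1 = 2          # F can attach at E or D
s1_s2_positions_before = 3     # attachment via A, B or C
before = arrangements(2) * arrangements(3) * f_positions_on_s1 * s1_s2_positions_before

# After sub-segmentation by S' = {B, C, D}: figures as stated in the text.
s1_arrangements_after = 1      # D must be used to attach to S2
s2_arrangements_after = 6
s1_s2_positions_after = 2      # attachment via B or C only
after = s1_arrangements_after * s2_arrangements_after * f_positions_on_s1 * s1_s2_positions_after

if __name__ == "__main__":
    print(before, after)  # 108 24
```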

Relative Scoring

Relative scoring methods will allow for differentiation of results which have the
same quality score. One method which can be used is a matched intensity scoring
method. Matched intensity can also be further refined into matched sequence (only
glycosidic cleavages) intensity and linkage information (cross ring, special cleavages
with or without concomitant glycosidic cleavages) intensity.

Matched intensities obtains the sum of intensities of all peaks which have
matched with at least one fragment within a fragment subset (e.g. glycosidic, cross ring,
or both together). A peak matching with at least one fragment [...] correct will have a
greater number of [...]. The matched intensity score is particularly useful for [...]
structures, which may otherwise have an [...]. The matched intensity score will determine
the quantity of [...] matched, and a difference in score suggests [...] fragment types [...].

If the experimental data set is [...] required information, i.e. no unique fragments can be
found [...]. In order to [...] process may only be [...] structures being fragmented. A
structure which has at least [...] which matches with a peak true mass will have a set of
fragments associated with it. This fragment set is the set of [...] formed from glycosidic
cleavage types [...] primary [...] fragments is limited by refining the data set [...] types
of fragments generated based upon the [...] scoring, it is possible to keep the data set
size to a manageable size.

[...] GMF, not all of the [...] spectrum, as they may not have the right [...]. In order to
exploit this, a more detailed GMF can be performed [...] device, or generated on the fly
for [...] entire GMF solution space for [...]

[...] for fragments to be available in every GMF [...]

As the data sets become more refined, and the possible solution set more
relevant, the matched intensity score will increase. Initial data sets will contain generic
fragments, and will not match more exotic fragments which may occur. However, these
exotic fragments may not necessarily be useful in determining the correct result out of a
large result set. For example, the intensity of the peak matching the fragment may be
very low, or the fragment may occur in many of the structures. As the result set is reduced
in size, the importance of these fragments increases, and they play a very important role
in the selection of the most probable candidate structure.
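
The matched intensity described in this section can be sketched as a sum over peaks, where each peak is counted once if it matches at least one fragment in the chosen subset. The mass tolerance, the function name and the peak values below are illustrative assumptions and are not taken from the specification.

```python
from typing import Iterable, Sequence, Tuple


def matched_intensity(
    peaks: Iterable[Tuple[float, float]],
    fragment_masses: Sequence[float],
    tolerance: float = 0.5,
) -> float:
    """Sum the intensities of peaks matching at least one fragment mass.

    `peaks` holds (m/z, intensity) pairs. A peak is counted once, even if
    several fragments share its mass. `tolerance` is an assumed match window.
    """
    total = 0.0
    for mz, intensity in peaks:
        if any(abs(mz - mass) <= tolerance for mass in fragment_masses):
            total += intensity
    return total


if __name__ == "__main__":
    peaks = [(689.9, 40.0), (527.2, 25.0), (365.1, 10.0)]
    # Two fragments share mass 689.9, but the peak is still counted only once.
    print(matched_intensity(peaks, [689.9, 689.9, 365.3]))  # 50.0
```

Restricting `fragment_masses` to glycosidic fragments or to cross-ring fragments gives the refined sequence and linkage intensities mentioned above.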

Results

Figure 5 shows a graph of peaks from fragmentation of a glycan structure 10.
Peak m/z 689.9 has been matched with two different fragments having the same mass.
Further information is required to determine whether both of the matched fragments,
or only a single one, is the correct fragment.

Example 1 

Figures 6 to 8 illustrate a first example. The oligosaccharide structure which is
empirically fragmented is shown in Figure 6. Figure 7 shows its m/z spectrum. Figures
8a to 8c show a table of results illustrating how the method can distinguish between
two isoforms of a structure when the grouping score is the same, by comparing the sum
of the missed intensities. The first structure is the correct structure and has a
lower total sum of missed intensities, despite both structures having the same score of
0.8 as determined by equation 1.

Example 2

Figures 9 to 11 illustrate a second example. The oligosaccharide structure
which is empirically fragmented is shown in Figure 9. Figure 10 shows its m/z spectrum.
The first result on this table is correct as it has both a perfect grouping score and the
lowest number of missed intensities.

It will be appreciated by persons skilled in the art that numerous variations
and/or modifications may be made to the invention as shown in the specific
embodiments without departing from the spirit or scope of the invention as broadly
described. The present embodiments are, therefore, to be considered in all respects as
illustrative and not restrictive.